Calculate the slope of the cost function at the current position
If the slope is negative, move to the right; if it is positive, move to the left
Repeat
Example with only one parameter θ (theta)
and only one local minimum of the cost function J, which is therefore the global minimum, where J′(θ) = 0
start at random position theta in the cost function
while |J'(theta)| > 0:
    calculate the slope of J, J'(theta), at the current position
    if the slope is negative:
        move theta to the right by step width alpha
    if the slope is positive:
        move theta to the left by step width alpha
the step width α is called the learning rate
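A minimal runnable sketch of this loop in Python, assuming a stand-in cost J(θ) = (θ − 3)² (not from the notes) and a small tolerance in place of an exact J′ = 0 check; the usual update θ ← θ − α·J′(θ) folds both branches into one line, since a negative slope moves θ to the right and a positive slope moves θ to the left:

```python
# Minimal 1-D gradient descent; J(theta) = (theta - 3)^2 is an assumed
# stand-in cost with its global minimum at theta = 3, so J'(theta) = 2*(theta - 3).
import random

def J_prime(theta):
    return 2 * (theta - 3)            # slope of the cost at the current position

theta = random.uniform(-10.0, 10.0)   # start at a random position
alpha = 0.1                           # step width alpha: the learning rate

while abs(J_prime(theta)) > 1e-6:     # stop once the slope is (almost) zero
    # a negative slope moves theta to the right, a positive slope to the left
    theta -= alpha * J_prime(theta)

print(theta)  # ~3.0
```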
Gradient
the multidimensional generalization of the derivative
gives us the direction of the steepest ascent
i.e., which weight $\theta_i$ has the biggest influence on the cost
$$\nabla J = \frac{\partial J}{\partial \theta} = \begin{bmatrix} \frac{\partial J}{\partial \theta_0} \\ \vdots \\ \frac{\partial J}{\partial \theta_p} \end{bmatrix}$$
with only one θ: $\nabla J(\theta) = \frac{dJ}{d\theta} = J'$
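As a quick sanity check of this definition, here is a sketch that approximates the gradient with central finite differences; the two-parameter cost below is an arbitrary illustration, not taken from the notes:

```python
import numpy as np

def J(theta):
    # arbitrary example cost with two parameters
    return (theta[0] - 1) ** 2 + 3 * (theta[1] + 2) ** 2

def numerical_gradient(J, theta, eps=1e-6):
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        # one partial derivative per component of theta
        grad[i] = (J(theta + step) - J(theta - step)) / (2 * eps)
    return grad

theta = np.array([0.0, 0.0])
print(numerical_gradient(J, theta))  # ~[-2. 12.]: here theta_1 influences the cost most
```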
Example with a single $\theta = \theta_1$
Example data
Data points: [4, 2] and [4, 4], i.e. $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$, $X = \begin{bmatrix} x_{1,1} \\ x_{2,1} \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}$
We want to fit a line with no intercept: $\hat{y}_j = \theta_1 \cdot x_{j,1}$
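Assuming the usual mean squared error cost $J(\theta_1) = \frac{1}{2m}\sum_j (\theta_1 x_{j,1} - y_j)^2$ (the notes do not spell out $J$ here), the derivative is $J'(\theta_1) = \frac{1}{m}\sum_j (\theta_1 x_{j,1} - y_j)\,x_{j,1}$, and gradient descent on the example data converges to $\theta_1 = 0.75$; a minimal sketch:

```python
import numpy as np

X = np.array([4.0, 4.0])  # x_{1,1}, x_{2,1}
y = np.array([2.0, 4.0])  # y_1, y_2
m = len(y)

def J_prime(theta1):
    # derivative of the assumed MSE cost J(theta1) = 1/(2m) * sum((theta1*x - y)^2)
    return (1 / m) * np.sum((theta1 * X - y) * X)

theta1, alpha = 0.0, 0.01
for _ in range(1000):
    theta1 -= alpha * J_prime(theta1)  # move against the slope

print(theta1)  # ~0.75: best-fit slope through [4,2] and [4,4]
```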
A learning curve is a plot of model learning performance over experience or time (e.g., number of iterations of gradient descent or amount of training data).
To really understand how a model behaves, the data must be split into a training and a validation set
Training Learning Curve: Learning curve calculated from the training dataset that gives an idea of how well the model is learning
Validation Learning Curve: Learning curve calculated from a hold-out validation dataset that gives an idea of how well the model is generalizing
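A minimal sketch of how both curves could be produced, assuming the no-intercept model from above, an MSE cost tracked after every gradient descent iteration, and synthetic placeholder data for the two splits:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic placeholder data and an 80/20 train/validation split (assumed, not from the notes)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 0.75 * X + rng.normal(0, 1, size=100)
X_train, y_train, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

def cost(theta1, X, y):
    return np.mean((theta1 * X - y) ** 2) / 2   # MSE cost J

theta1, alpha = 0.0, 0.005
train_curve, val_curve = [], []
for _ in range(100):                            # experience axis: gradient descent iterations
    theta1 -= alpha * np.mean((theta1 * X_train - y_train) * X_train)
    train_curve.append(cost(theta1, X_train, y_train))  # training learning curve
    val_curve.append(cost(theta1, X_val, y_val))        # validation learning curve

plt.plot(train_curve, label="training")
plt.plot(val_curve, label="validation")
plt.xlabel("iteration")
plt.ylabel("cost J")
plt.legend()
plt.show()
```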